Add LongBench V2 benchmark #249
RobotSail
left a comment
Thanks for the PR @eshwarprasadS !
The PR has all of the right ideas; there are just a few minor changes to make, which I've outlined in this review. Once those are addressed, this should be good to merge.
src/instructlab/eval/longbench.py
Outdated
) / 2

# Calculate overall score
all_scores = [v for k, v in eval_results.items() if k != "overall_score"]
Why do we check if k != "overall_score"? We shouldn't have set this key yet
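To illustrate the point, here is a minimal sketch of computing the overall score without the filter. It assumes the overall score is a plain mean of the per-task scores (the excerpt above does not show how `all_scores` is aggregated), and the helper name `add_overall_score` and the task names are hypothetical, not taken from the PR:

```python
from statistics import mean


def add_overall_score(eval_results: dict[str, float]) -> dict[str, float]:
    """Return a copy of eval_results with an 'overall_score' entry added.

    Assumption: the overall score is the unweighted mean of per-task scores.
    """
    if not eval_results:
        raise ValueError("eval_results is empty; nothing to average")
    # The key should not exist yet, so fail loudly instead of silently
    # filtering it out of the average.
    if "overall_score" in eval_results:
        raise ValueError("'overall_score' is already set")
    overall = mean(eval_results.values())
    return {**eval_results, "overall_score": overall}


# Hypothetical per-task scores, for illustration only.
results = {"task_a": 40.0, "task_b": 60.0}
scored = add_overall_score(results)
```

If the key can legitimately be present at this point (for example, when results are recomputed), the filter makes sense, but then a comment explaining why would help.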
@eshwarprasadS It looks like you may need to rebase your changes.
@mergify rebase
Signed-off-by: eshwarprasadS <eshwarprasad.s01@gmail.com>
…-cuda extras Signed-off-by: eshwarprasadS <eshwarprasad.s01@gmail.com>
…y served openai-compatible model endpoints Signed-off-by: eshwarprasadS <eshwarprasad.s01@gmail.com>
… name parameter Signed-off-by: eshwarprasadS <eshwarprasad.s01@gmail.com>
✅ Branch has been successfully rebased
This pull request has merge conflicts that must be resolved before it can be
@eshwarprasadS It looks like you have a few merge conflicts that need to be fixed. Once those are solved, we can merge this.
Signed-off-by: Eshwar Prasad Sivaramakrishnan <eshwarprasad.s01@gmail.com>
Adding LongBench to eval options.
Install extras with:
pip install instructlab-eval[longbench]
Uses a vLLM backend to serve the model for generation.
Runs like so:
Output JSON looks like so: